This presentation demonstrates how housing values vary with location and distance from the coast. The main features are housing_median_age (years), median_income (tens of thousands of USD), median_house_value (USD) and ocean_proximity.
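The data analyzed is the California housing prices dataset downloaded from Kaggle. It contains 20,640 observations and 10 features: longitude, latitude, housing_median_age, total_rooms, total_bedrooms, population, households, median_income, median_house_value and ocean_proximity.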
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import plotly.express as px
%matplotlib inline
# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")
# load the dataset into a pandas DataFrame
df = pd.read_csv('housing.csv')
df.sample(5)
| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity |
|---|---|---|---|---|---|---|---|---|---|---|
| 5319 | -118.42 | 34.06 | 52.0 | 1881.0 | 334.0 | 640.0 | 321.0 | 6.8710 | 500001.0 | <1H OCEAN |
| 18277 | -122.07 | 37.35 | 35.0 | 1447.0 | 205.0 | 619.0 | 206.0 | 9.8144 | 500001.0 | <1H OCEAN |
| 2811 | -119.03 | 35.42 | 42.0 | 1705.0 | 418.0 | 905.0 | 393.0 | 1.6286 | 54600.0 | INLAND |
| 5523 | -118.36 | 33.98 | 40.0 | 1113.0 | 234.0 | 584.0 | 231.0 | 3.0927 | 316000.0 | <1H OCEAN |
| 6211 | -117.89 | 34.07 | 35.0 | 834.0 | 137.0 | 392.0 | 123.0 | 4.5179 | 218800.0 | <1H OCEAN |
# Drop null values
df.dropna(axis=0, inplace=True)
# Change the datatype of some features from `float` to `int`
obs = ['housing_median_age', 'total_rooms', 'total_bedrooms', 'population', 'households']
for v in obs:
    df[v] = df[v].astype('int')
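The order of the two cleaning steps matters: NumPy integer dtypes cannot represent a missing value, so casting a column that still contains `NaN` to `int` raises an error. The sketch below uses a small made-up Series (illustrative only, not taken from `housing.csv`) to show the failure, the drop-then-cast approach used by the `dropna` cell above, and pandas' nullable `Int64` dtype as an alternative if you would rather keep rows with missing values.

```python
import numpy as np
import pandas as pd

# Toy column with one missing value (illustrative only, not from housing.csv)
s = pd.Series([334.0, np.nan, 418.0])

# Casting straight to a NumPy int dtype fails because NaN has no integer form
try:
    s.astype('int')
    cast_failed = False
except ValueError as err:  # pandas raises a ValueError (or a subclass of it) here
    cast_failed = True
    print('astype(int) failed:', err)

# Option 1: drop the missing rows first, then cast (as the notebook does)
as_int = s.dropna().astype('int')
print(as_int.tolist())  # [334, 418]

# Option 2: keep every row and use pandas' nullable integer dtype
as_nullable = s.astype('Int64')
print(as_nullable)
```

Dropping rows is the simpler choice for plotting; the nullable dtype is worth considering if the rows with missing `total_bedrooms` values matter for other features.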
There is a positive correlation between median household income and median house value, as shown in the plot below:
# Scatter plot of house value and income
sb.scatterplot(data=df, x='median_income', y='median_house_value')
plt.xlabel('Median Income [10,000 USD]')
plt.ylabel('House Value [USD]')
plt.title('Income vs House Value');
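To put a number on this relationship, one option is a correlation coefficient. The sketch below computes Pearson (linear) and Spearman (rank-based) correlations on the five sample rows shown earlier, only so that it runs without `housing.csv`; on the full dataset you would apply the same calls to `df`. Five rows are far too few to draw conclusions from, so treat the output as a demonstration of the method, not a result. Spearman is computed here as the Pearson correlation of the ranks, which avoids a SciPy dependency; it is worth checking because `median_house_value` appears to be capped at 500001 in the sample, which can distort a purely linear measure.

```python
import pandas as pd

# The five sample rows shown earlier (only the two columns needed here)
sample = pd.DataFrame({
    'median_income': [6.8710, 9.8144, 1.6286, 3.0927, 4.5179],
    'median_house_value': [500001.0, 500001.0, 54600.0, 316000.0, 218800.0],
})

income = sample['median_income']
value = sample['median_house_value']

# Pearson: strength of the linear association
pearson = income.corr(value)

# Spearman: Pearson correlation of the ranks (ties get their average rank)
spearman = income.rank().corr(value.rank())

print(f'Pearson r:    {pearson:.3f}')
print(f'Spearman rho: {spearman:.3f}')

# On the full dataset: df['median_income'].corr(df['median_house_value'])
```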
The median age of the houses does not appear to have a strong effect on their value.
# Scatter plot of house age and house value
sb.scatterplot(data=df, x='housing_median_age', y='median_house_value')
plt.xlabel('Housing Age [Years]')
plt.ylabel('House Value [USD]')
plt.title('Housing Age vs House Value');
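A dense scatter plot can hide a weak trend, so a complementary check is to group `housing_median_age` into bands and compare the median house value per band; roughly flat medians across bands would support the claim above. The sketch uses the five sample rows shown earlier so it runs on its own, and the 10-year bin edges are an arbitrary, hypothetical choice; with the full dataset you would replace `sample` with `df`.

```python
import pandas as pd

# The five sample rows shown earlier (only the two columns needed here)
sample = pd.DataFrame({
    'housing_median_age': [52, 35, 42, 40, 35],
    'median_house_value': [500001.0, 500001.0, 54600.0, 316000.0, 218800.0],
})

# Hypothetical 10-year bands; intervals are right-closed, e.g. (30, 40]
bins = [0, 10, 20, 30, 40, 50, 60]
age_band = pd.cut(sample['housing_median_age'], bins=bins)

# Median house value and row count per band; observed=True drops empty bands
summary = (sample.groupby(age_band, observed=True)['median_house_value']
                 .agg(['median', 'count']))
print(summary)
```

Reporting the count next to the median makes it easy to spot bands that contain too few blocks to compare fairly.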
Location does affect house value: the closer a house is to the coast, the higher its value tends to be.
fig = px.scatter_mapbox(df,
                        lat='latitude',
                        lon='longitude',
                        center={'lat': 37.09, 'lon': -121},
                        height=600,
                        width=600,
                        color='median_house_value',
                        hover_data=['ocean_proximity'])
fig.update_layout(mapbox_style='open-street-map', title='Housing Price and Location')
fig.show()
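The map shows the pattern visually; a table of median house value per `ocean_proximity` category gives a compact numeric check of the same claim. The sketch runs on the five sample rows shown earlier (only two of the categories appear there); on the full dataset the same `groupby` would be applied to `df`.

```python
import pandas as pd

# The five sample rows shown earlier (only the two columns needed here)
sample = pd.DataFrame({
    'ocean_proximity': ['<1H OCEAN', '<1H OCEAN', 'INLAND', '<1H OCEAN', '<1H OCEAN'],
    'median_house_value': [500001.0, 500001.0, 54600.0, 316000.0, 218800.0],
})

# Median value and number of rows per proximity category, highest median first
by_proximity = (sample.groupby('ocean_proximity')['median_house_value']
                      .agg(['median', 'count'])
                      .sort_values('median', ascending=False))
print(by_proximity)
```

The median is used instead of the mean because it is less affected by the apparent 500001 ceiling on house values.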
Generate Slideshow: Once you're ready to generate your slideshow, use the `jupyter nbconvert` command to generate the HTML slide show. From the terminal or command line, use the following expression:
!jupyter nbconvert Part_II_slide_deck_template.ipynb --to slides --post serve --no-input --no-prompt
[NbConvertApp] Converting notebook Part_II_slide_deck_template.ipynb to slides
[NbConvertApp] Writing 1497128 bytes to Part_II_slide_deck_template.slides.html
[NbConvertApp] Redirecting reveal.js requests to https://cdnjs.cloudflare.com/ajax/libs/reveal.js/3.5.0
Traceback (most recent call last):
  File "C:\Users\TIMOTHY\anaconda3\Scripts\jupyter-nbconvert-script.py", line 10, in <module>
    sys.exit(main())
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\jupyter_core\application.py", line 264, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\traitlets\config\application.py", line 846, in launch_instance
    app.start()
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\nbconvert\nbconvertapp.py", line 369, in start
    self.convert_notebooks()
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\nbconvert\nbconvertapp.py", line 541, in convert_notebooks
    self.convert_single_notebook(notebook_filename)
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\nbconvert\nbconvertapp.py", line 508, in convert_single_notebook
    self.postprocess_single_notebook(write_results)
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\nbconvert\nbconvertapp.py", line 480, in postprocess_single_notebook
    self.postprocessor(write_results)
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\nbconvert\postprocessors\base.py", line 28, in __call__
    self.postprocess(input)
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\nbconvert\postprocessors\serve.py", line 90, in postprocess
    http_server.listen(self.port, address=self.ip)
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\tornado\tcpserver.py", line 151, in listen
    sockets = bind_sockets(port, address=address)
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\tornado\netutil.py", line 161, in bind_sockets
    sock.bind(sockaddr)
OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted
This should open a tab in your web browser where you can scroll through your presentation. Sub-slides can be accessed by pressing 'down' when viewing their parent slide. Make sure you remove all of the quote-formatted guide notes like this one before you finish your presentation! Finally, you can stop the kernel.
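The run above failed with `OSError: [WinError 10048]`, which the traceback shows was raised when the `serve` post-processor tried to bind its HTTP server to a port that another process (for example, an earlier run of the same command) was already using. Stopping that process, or serving on a different port, should avoid the error. The sketch below is a best-effort helper, not part of the original notebook: it assumes nbconvert's default serve port of 8000 on 127.0.0.1, checks whether that port is free, falls back to a port chosen by the operating system, and builds the same command with the `--ServePostProcessor.port` option. The function names are hypothetical, and another program could still claim the port between the check and the launch.

```python
import socket


def is_port_free(port, host='127.0.0.1'):
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:  # e.g. WinError 10048 / EADDRINUSE
            return False
    return True


def pick_port(preferred=8000, host='127.0.0.1'):
    """Use the preferred port if it is free, else let the OS pick an unused one."""
    if is_port_free(preferred, host):
        return preferred
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 asks the OS for any free port
        return sock.getsockname()[1]


def slides_command(notebook, port):
    """The nbconvert call used above, plus an explicit port for the server."""
    return ['jupyter', 'nbconvert', notebook,
            '--to', 'slides', '--post', 'serve',
            '--no-input', '--no-prompt',
            f'--ServePostProcessor.port={port}']


port = pick_port()
cmd = slides_command('Part_II_slide_deck_template.ipynb', port)
print(' '.join(cmd))
# To launch: import subprocess; subprocess.run(cmd, check=True)
```

Printing the command instead of running it keeps the check separate from the launch, so you can paste the printed line into a terminal or after `!` in a notebook cell.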